SHOULD GANZFELD RESEARCH CONTINUE 
TO BE CRUCIAL IN THE SEARCH FOR A 
REPLICABLE PSI EFFECT? PART I. DISCUSSION 
PAPER AND INTRODUCTION TO AN 
ELECTRONIC-MAIL DISCUSSION 


By Julie Milton 


ABSTRACT: A group of recent, well-controlled ganzfeld studies failed to replicate the 
positive findings of earlier work (Milton & Wiseman, 1999a). This presents a challenge to 
claims that a ganzfeld psi effect can be replicated across experimenters under 
methodologically stringent conditions. Because of the ganzfeld's history as a focus for 
proof-oriented questions, this situation has implications for parapsychology as a whole. In 
this paper, it is shown that replication of effect size in the recent ganzfeld studies is not 
demonstrated across experimenters, regardless of whether the database is updated to 
include recent studies or whether outcome and cumulation statistics different from those 
preplanned are applied. Problems with interpreting as strong evidence for psi other 
parapsychological meta-analyses of less clearly well-conducted studies and apparently 
consistent process-oriented findings are discussed. The case is made for continuing with 
ganzfeld research as an important focus of parapsychology’s claims for replicability. It is 
argued that if there is a replicable ganzfeld psi effect, however, the procedures necessary to 
produce it have not yet been identified. It is proposed that process-oriented work be 
directed to the goal of identifying which studies should be able to replicate an above-chance 
effect, and that these studies, identified by their planned procedures before they have been 
conducted, should provide the basis for future tests of replication. 

The organization of an international, electronic-mail discussion of these issues among 
41 researchers with a special interest in ganzfeld research is described. The edited 
transcript of the discussion is presented in Part II. 


Discussion Paper 

Despite the field’s long history, there is still controversy over whether 
the results of parapsychology experiments offer evidence for a genuine 
communication anomaly — psi. For some time, parapsychologists have 
recognized that the evidence for psi most likely to convince fair-minded 
but critical scientists would be an experimental procedure that a range of 
experimenters could carry out that would produce reasonably replicable 
effects. Unless the experiment’s effects could be replicated across experi- 
menters, there would always remain fraud, error, or sensory leakage as 
strong alternative explanations to the psi hypothesis. 

For many years, such replicability appeared to be out of reach. This 
perception appeared to change, however, with the arrival in the 1970s of 
several research programs involving free-response ESP. In particular, 
ganzfeld ESP studies seemed especially promising. Not only did a range 
of experimenters appear to obtain outcomes in ganzfeld studies that 
were above chance, but they did so under conditions that appeared to be 
well-controlled and without using specially selected participants. In 1981, 
Ray Hyman, a psychologist skeptical of the existence of psi, wanted to 
conduct a critical assessment of a research program that represented 
parapsychology’s strongest evidence. Because of claims then being made 
for ganzfeld research, it was an obvious choice for his attention (Hyman, 
1985). Hyman (1985) meta-analyzed the 42 studies conducted since 
publication of the first ganzfeld ESP study in 1974, finding an overall 
statistically significant outcome; however, he concluded that the 
methodological problems that he identified in the studies could account for the 
positive results. In response, Charles Honorton, a proponent of ganzfeld 
research, conducted his own meta-analysis of the database, restricting his 
attention to the 28 studies reporting direct hits as an outcome measure 
(Honorton, 1985). He also obtained a statistically significant overall 
outcome (see Table 1); but although he conceded that the studies contained 
potential methodological problems, he did not agree that the problems 
were sufficient to account for the overall outcome. 

Rather than continue to dispute the matter, Hyman and Honorton 
(1986) instead jointly drew up a set of methodological guidelines for the 
stringent conduct of future ganzfeld studies, agreeing that the case for 
psi in the ganzfeld would rely on a broad range of experimenters obtain- 
ing positive results under such conditions. Meanwhile, Honorton and his 
research team at Princeton Research Laboratories (PRL) had begun in 
1982 a series of partially automated ganzfeld studies (autoganzfeld 
studies) designed to meet Hyman’s methodological concerns (Bem & 
Honorton, 1994; Honorton et al., 1990). Before PRL closed in 1989, 
eleven series were completed, obtaining a statistically significant overall 
outcome and a mean effect size nearly identical to that obtained in 
Honorton’s (1985) meta-analysis of the earlier ganzfeld database (see Ta- 
ble 1). Replication under stringent conditions of the early ganzfeld re- 
sults appeared to suggest that methodological problems were unlikely to 
have accounted entirely for the effects obtained in the earlier studies; 
however, Bem and Honorton pointed out that it still remained for their 
results to be replicated by other experimenters under similarly stringent 
conditions. 

In early 1997, Richard Wiseman and I, in an attempt to determine 
whether other experimenters had indeed succeeded in replicating these 
results under well-controlled conditions, meta-analyzed the 30 published 
ganzfeld studies conducted since the publication of Hyman and 
Honorton’s methodological guidelines (Milton & Wiseman, 1999a). The 
studies’ combined outcome was not statistically significant, and the mean 
effect size was near zero (see Table 1). The mean effect size in the recent 
studies is less than a seventeenth of that found in the PRL work, and a 
post hoc comparison shows that it is statistically significantly lower than 
the mean effect sizes of the PRL and earlier ganzfeld databases (see Table 
A2). 

Updating our meta-analysis to include the studies (see Table A1) 
published to date (March 1999) since our meta-analysis was completed in 
February 1997 renders the overall cumulation statistically significant,¹ 
but fails to raise the mean effect size to even a sixth of that obtained in the 
PRL or earlier ganzfeld studies meta-analyzed by Hyman (1985) and 
Honorton (1985) (see Table 1). Moreover, the statistical significance of 
the updated cumulation is due not to renewed success by a range of inves- 
tigators, but solely to the inclusion of an extremely successful study by 
Dalton (1997a) (see Table 1). Whether Dalton’s study is included or not, 
it is clear that the effect size obtained in Honorton’s autoganzfeld studies 
and in the earlier ganzfeld database has not replicated. Post hoc compari- 
sons show that the updated database of recent studies, with or without the 
Dalton study, has a mean effect size statistically significantly lower than 
those of the earlier meta-analyses (see Table A2). 
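
The dependence of the updated cumulation on a single study can be checked from the Stouffer z values in Table 1. The following Python sketch assumes that the cumulations are unweighted Stouffer z scores, that is, the sum of the individual study z scores divided by the square root of the number of studies; that assumption and the function name are illustrative rather than taken from the text.

```python
from math import sqrt


def sum_of_study_z(stouffer_z, n_studies):
    """Recover the sum of study z scores from an unweighted Stouffer z.

    Assumes Stouffer z = (sum of z_i) / sqrt(k), with no weighting.
    """
    return stouffer_z * sqrt(n_studies)


# Table 1 rows: all studies 1987 to present, with and without Dalton (1997a).
with_dalton = sum_of_study_z(2.28, 39)
without_dalton = sum_of_study_z(1.45, 38)

# Under the unweighted assumption, the difference is the z score
# implied for the single excluded study.
implied_single_study_z = with_dalton - without_dalton
print(round(implied_single_study_z, 1))
```

On these assumptions the single excluded study contributes a z score of roughly 5.3, which is more than a third of the summed z scores of the whole updated database.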

The same is true if a variety of alternative outcome calculation and 
cumulation methods are used to analyse the recent studies rather than 
the ones that we preplanned and applied (Milton & Wiseman, 1997a). 
Since the presentation of our meta-analysis at the 1997 Parapsychological 
Association Annual Convention, a number of colleagues have informally 
pointed out that using several different methods of calculating or cumu- 
lating individual study outcomes, or introducing various criteria for ex- 
cluding outliers, results in overall statistical significance of varying de- 
grees for the database. Regardless of arguments over the post hoc and 
possibly selective nature of these analyses, none of them has the effect of 
raising the mean effect size in the new database by any meaningful 
amount, because of the relative insensitivity of means compared to the 
statistical significance of cumulations when slight changes are made in 


¹ It was not possible to calculate outcomes for three of the studies (see footnotes to 
Table A1), but given that one of these studies (Parker & Westerlund, 1998, Serial Ganzfeld) is 
clearly slightly below chance and the remaining two studies are very small, with only 12 trials 
each, it is unlikely that their results would increase the cumulated outcome of the database 
by a meaningful amount. 



Table 1 

Outcomes of meta-analyses of ESP ganzfeld studies 

                                                                            Effect size^a
                                 Number      Number     Stouffer  p              Standard   95% confidence
Meta-analysis                    of studies  of trials  z         (1-t)          Mean  deviation  interval

Honorton (1985)^b                28          835        6.60      2.2 x 10^-11   .26   .38        .12 to .40
Bem & Honorton (1994)            11          329        3.41      .00033         .23   .24        .09 to .37
Milton & Wiseman (1999a)         30          1198       .70       .24            .013  .23        -.07 to .10
All studies 1987 to present^c    39          1588       2.28      .011           .038  .26        -.04 to .12
All studies 1987 to present      38          1460       1.45      .074           .027  .25        -.05 to .11
  excluding Dalton (1997a)^c

a Effect size is z/N^(1/2). 

b Here Honorton’s meta-analysis solely represents the early ganzfeld database because Hyman’s (1985) report 
does not provide the number of trials in each study needed for the calculation of z/N^(1/2), the effect size used in 
this table. 

c Individual study outcomes were calculated following the same procedures as in Milton and Wiseman (1999a). 
the treatment of a database. For example, using Bem and Honorton’s 
(1994) method to sum the number of direct hits obtained across studies 
(approximating the number of direct hits from the standard normal 
deviate of the study’s reported outcome measure if direct hits were not 
reported) results in a total of 331 hits in the 1198 trials in the database. This 
is a statistically significant outcome, p = .019, one-tailed; but the effect size 
measured in this way is only .06. 
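
The contrast between a significant cumulation and a very small effect size can be illustrated with a short calculation on the summed direct hits. The Python sketch below assumes a chance hit rate of .25 on every trial (a four-choice judging task), which the text here does not state; the function names are hypothetical, and the calculation is not a reconstruction of Bem and Honorton’s procedure.

```python
from math import comb, erfc, sqrt


def normal_summary(hits, trials, chance=0.25):
    """Normal-approximation z, one-tailed p, and effect size z / sqrt(N)."""
    expected = trials * chance
    sd = sqrt(trials * chance * (1.0 - chance))
    z = (hits - expected) / sd
    p_one_tailed = 0.5 * erfc(z / sqrt(2.0))
    effect_size = z / sqrt(trials)
    return z, p_one_tailed, effect_size


def exact_upper_tail_quarter(hits, trials):
    """Exact binomial P(X >= hits) for a chance rate of exactly 1/4.

    Uses integer arithmetic so that no term underflows.
    """
    numerator = sum(
        comb(trials, k) * 3 ** (trials - k) for k in range(hits, trials + 1)
    )
    return numerator / 4 ** trials


z, p_normal, effect_size = normal_summary(331, 1198)
p_exact = exact_upper_tail_quarter(331, 1198)
print(round(z, 2), round(effect_size, 2))
```

With these inputs, z is about 2.1 and z divided by the square root of the number of trials is about .06: the large number of trials makes a hit rate only a few percentage points above chance statistically significant, while the effect size stays small.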

Implications of the current situation 

The current situation, then, is that the studies that appear to constitute 
the group proposed by Bem and Honorton (1994) as a crucial test 
of the evidence for psi in the ganzfeld have clearly failed to show replica- 
tion of an above-chance effect across experimenters; and, to date, they 
only show overall statistical significance if one extremely successful study 
is included. On the face of it, this appears to be an important replication 
failure because the unique history of ganzfeld research — strong claim, 
critical assessment, methodological guidelines, methodological refine- 
ment, initial replication — has led to it being presented to mainstream sci- 
ence as a critical test of the evidence for psi. 

It has been almost 20 years, however, since Hyman’s (1985) 
meta-analysis placed the focus for assessing the evidence for psi on 
ganzfeld research. Since that time, meta-analyses have been conducted of 
other parapsychological databases, including some whose main purpose 
has been to examine process-oriented hypotheses. The studies within 
them are not as well-controlled as the recent ganzfeld studies appear to 
be; but their highly statistically significant cumulated outcomes, their ap- 
parent resistance to explanations in terms of selective reporting, their 
general lack of statistically significant correlations between individual 
studies’ quality and effect size in these databases, and the apparent 
replicability across experimenters of successful studies within them have 
led to their being presented both within and outside parapsychology as 
providing strong evidence that psi is a genuine communication anomaly 
that replicates across experimenters (e.g. Honorton & Ferrari, 1989; 
Radin, 1997; Radin & Ferrari, 1991; Radin & Nelson, 1989; Utts, 1991). If 
they do indeed constitute strong evidence, then the replication failure of 
the recent ganzfeld studies requires no negative reassessment of the 
claims for psi nor any action to continue to seek evidence for 
across-experimenter replication of a psi effect under stringent conditions 
in the ganzfeld. 
Table 2 

Mean Methodological Quality of Studies in Parapsychology 
Meta-analyses Expressed as a Percentage of the Maximum 
Number of Quality Points Available. 

Meta-analysis              Effect examined                  Mean quality (%)

Honorton (1985)            Ganzfeld ESP                     70^a
Hyman (1985)               Ganzfeld ESP                     44^b
Honorton & Ferrari (1989)  Forced-choice precognition       41
Honorton et al. (1998)     ESP-extraversion relationship
                             Forced-choice studies          45
                             Free-response studies          86
Lawrence (1993)            ESP-belief in psi relationship   46
Milton (1997)              Non-ASC free-response ESP
                             GESP studies^c                 61
                             Clairvoyance studies^c         58
                             Precognition studies^c         47
Radin & Ferrari (1991)     Dice PK                          Not reported
Radin & Nelson (1989)      Micro-PK                         Not reported
Stanford & Stein (1994)    ESP-Hypnosis relationship        49
Steinkamp et al. (1998)    Precognition vs clairvoyance
                             Clairvoyance studies           66
                             Precognition studies           63


Note: The meta-analyses used different quality criteria, ranging from 2 to 18 safeguards being 
examined in each meta-analysis. The mean quality of each meta-analysis is, therefore, not 
directly comparable with another. 

a In this meta-analysis, Honorton assessed study quality on just two features — the availability of 
sensory cues from target handling and the adequacy of the target randomization method. He 
assigned partial credit to studies containing methodological features (the use of single rather 
than duplicate target sets and randomization using hand shuffling, coin-flipping or 
die-throwing) that have received no credit in other parapsychological meta-analyses 
(Honorton & Ferrari, 1989; Lawrence, 1993; Milton, 1997; etc.) . This method allowed him to 
make a distinction between these studies and studies using less stringent or unknown meth- 
ods; but for the purposes of this table, the method arguably inflates apparent study quality by a 
considerable amount. For example, all but one study received at least one quality point for 
preventing sensory cueing regardless of whether a duplicate target set was used. If quality 
points are assigned in a manner more consistent with the other meta-analyses, with one point 
for the use of duplicate judging sets and no points for manual methods of randomization, the 
studies obtained 46% of the maximum available quality points. 

b Based on only 4 of Hyman’s 12 flaw categories. One of the excluded categories involved as- 
signing a flaw to studies in which it was not clear that receivers’ friends were used as senders. 
This does not seem appropriate because it is absence of appropriate security rather than the 
relationship between participants that would constitute an inadequate precaution against col- 
lusion. The remaining 7 flaws concerned statistical errors and the use of multiple outcome 
measures without adjustment for multiple analysis. They could not have affected study out- 
comes in the meta-analysis because Hyman calculated outcomes using appropriate statistics 
and single measures and are not therefore included here. 

c The original paper reports these percentages in terms of publication type rather than study 
type. 

Problems in interpreting meta-analyses of studies of uncertain quality 

Even if internal analyses reveal no obvious problems, there are diffi- 
culties in interpreting meta-analyses as strong evidence for a phenome- 
non if the studies they contain are of uncertain or low methodological 
quality. As can be seen in Table 2, the parapsychological databases exam- 
ined so far consist of exactly such studies. The table summarizes the 
methodological quality observed in the major parapsychology databases 
meta-analyzed so far that have included individual study quality 
assessments. Setting aside Honorton’s (1985) and Hyman’s (1985) quality 
assessments of the early ganzfeld work, which present some problems of 
interpretation (see footnotes to the table), it can be seen that in fully half of 
the databases that reported mean study quality, studies scored on average 
fewer than half of the available methodological quality points. Only the 
14-study free-response sub-database in Honorton, Ferrari, and Bem’s 
(1998) meta-analysis contained studies that scored more than two-thirds 
of the available quality points; and it can be argued that omitted from 
that quality assessment were important quality criteria, such as the 
prespecification of sample size, the use of blind mentation transcription, 
the prevention of cues to judges from judging trials out of order, and so 
on (see Milton, 1997). Two meta-analyses did not report mean study 
quality at all. 

The lack of evidence that these databases in general consist of high 
quality studies introduces the possibility that their outcomes may have 
been inflated or, at worst, entirely caused by methodological flaws. To be 
a matter for concern in parapsychology databases, the effect sizes due to 
methodological flaws would have to be at least as large as the observed ef- 
fect sizes, and the flaws would have to be present in sufficient quantities 
(singly or in combination) to be relevant. There has been, however, very 
little empirical research to determine the effect sizes associated with the 
absence of the various methodological safeguards used in parapsychol- 
ogy (Milton & Wiseman, 1997b), and many meta-analyses do not report 
the frequency with which individual safeguards are not reported. It is, 
therefore, difficult to rule out methodological problems as an explana- 
tion for the observed results. There are, in fact, meta-analyses in which 
flaws likely to be associated with effect sizes not much, if any, smaller than 
those observed appear to be potentially prevalent. For example, it is clear 
that if the experimenter does not prespecify which of several possible 
measures (such as direct hits, ranks, etc. in free-response ESP studies) is 
to be used to test the null hypothesis, there is a potential to inflate study 
outcomes considerably, due to post hoc data selection. The effect size 
associated with such selection has not been calculated; however, a 
computer simulation by Hyman (1985) of the effects of being free to choose 
any of the four main outcome measures available when target ratings are 
used suggests that the probability of any one of them being statistically 
significant with an alpha of .05 is approximately .15. In a database of 78 
free-response studies (Milton, 1997), the observed probability of a study 
being statistically significantly above chance was .22, and 96% of studies 
did not report whether the choice of outcome measure was preplanned. 
Hyman’s study is likely to provide an extreme upper limit for the action 
of this particular flaw because it is not probable that post hoc selection of 
statistically significant outcome measures happens in every study, as it did 
in his simulation. Nevertheless, the potential effects of not prespecifying 
outcome measures are clearly not trivial in comparison with the outcomes 
of ESP studies. Similarly, recording errors have been estimated empiri- 
cally to occur on approximately 1% of trials and to be biased in favor of 
the observer’s hypothesis on two-thirds of the trials (Rosenthal, 1978). 
The mean effect size in Honorton and Ferrari’s (1989) database of 
forced-choice precognition studies is equivalent to raising a study’s out- 
come 1% above a mean chance expectation of 50%; but the frequency 
with which studies reported double-blind, double-checked, or automated 
data recording is not reported. 
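
The inflation produced by choosing among several outcome measures after the fact can be bracketed with a small simulation. The Python sketch below is not Hyman’s (1985) simulation: it assumes, purely for illustration, that the four measures yield standard normal z scores under the null hypothesis with a common pairwise correlation `rho`, and the function name, the correlation values, and the seed are all hypothetical.

```python
import random
from math import sqrt


def rate_any_significant(rho, n_measures=4, z_crit=1.645,
                         n_sims=20000, seed=1):
    """Share of null studies in which at least one measure exceeds z_crit.

    Each measure is modelled as sqrt(rho) * shared + sqrt(1 - rho) * own,
    so that every pair of measures has correlation rho (an assumption).
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)
        for _ in range(n_measures):
            own = rng.gauss(0.0, 1.0)
            z = sqrt(rho) * shared + sqrt(1.0 - rho) * own
            if z > z_crit:
                # Post hoc selection: report the first significant measure.
                successes += 1
                break
    return successes / n_sims


independent = rate_any_significant(0.0)   # analytic value: 1 - 0.95**4
identical = rate_any_significant(1.0)     # collapses to a single test
correlated = rate_any_significant(0.5)    # hypothetical middle case
print(independent, correlated, identical)
```

Under this model the false-positive rate runs from about .05 when the measures are perfectly correlated to about .19 (1 - .95^4) when they are independent, so a figure of approximately .15 lies within the range expected for positively correlated measures.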

In most parapsychological meta-analyses, estimates of overall study 
quality do not correlate statistically significantly with effect size. A num- 
ber of the researchers who obtained such null correlations have con- 
cluded that methodological problems, therefore, had no meaningful in- 
fluence on their databases (e.g., Honorton & Ferrari, 1989; Lawrence, 
1993; Radin & Ferrari, 1991; Radin & Nelson, 1989); however, in 
databases that do not consist entirely or mostly of clearly well-controlled 
studies, such as the parapsychology databases, there are many ways in which a 
relationship between methodological flaws and effect size could be ob- 
scured. This is a general problem in meta-analysis and not one restricted 
to parapsychology. Because these problems have received little attention 
in parapsychology (although see Hyman, 1985; Milton, 1997; Stanford & 
Stein, 1994), it is worth listing some of them. A selection, by no means 
exhaustive, is as follows: 

1. The absence of safeguards for certain procedures (such as ran- 
domization or sensory-shielding procedures) might inflate effect size 
more than the absence of safeguards for others (such as lack of dou- 
ble-blind checking of data records). In an unweighted correlation of 
study quality and effect size, the effect of the absence of these more im- 
portant safeguards might be drowned out by the other data (Stanford & 
Stein, 1994). In some cases, experts have been called upon to rate flaws in 
terms of their likely impact so that a weighted correlation can be per- 
formed between the absence of safeguards and effect size (e.g., Milton, 
1997; Radin & Ferrari, 1991). Thus far, these weightings have not indi- 
cated any such relationships, but it could be argued that, given the gen- 
eral lack of direct empirical evidence concerning effect sizes that result 
from the absence of safeguards, the experts’ judgments may be wrong, 
regardless of how well they agree with each other. 

2. It is unlikely that individual studies’ methodological quality is accu- 
rately reflected by their quality coding. Most parapsychological studies, 
especially those conducted before the 1980s, have not been written with a 
future meta-analyst’s quality checklist in mind; and it is often unclear 
from reports whether particular safeguards against sensory leakage, er- 
ror, post hoc data selection, and so on have been carried out. Presented 
with unclear or circumstantial evidence concerning the presence of a 
safeguard, coders will have to make a subjective judgment according to 
this partial, ambiguous information, influenced by their individual ex- 
pectations and assumptions about what experimenters are likely to do as 
a matter of standard laboratory procedure. Under these circumstances, 
errors in coding are very likely to arise. 

3. The binary coding of methodological safeguards as either present 
or absent in almost all parapsychology meta-analyses to date means that 
studies whose use of the safeguard is unknown must be included in the 
“safeguard present” or “safeguard absent” group. For example, it may be 
assumed that studies whose reports do not address at all the issue of study 
size belong with studies that clearly did not prespecify, as a safeguard 
against optional stopping, the number of trials to be conducted (Milton 
& Wiseman, 1997b). However, given that at least some, but by no means 
all, experimenters are likely to have used the safeguard without reporting 
it, this will result in a group of studies that all used the safeguard being 
compared with a group of studies in which some used the safeguard and 
some did not, in an unknown proportion. If the studies that did not use 
the safeguard had higher effect sizes as a result, then including the stud- 
ies that used but did not report the safeguard in the same category will re- 
duce the average effect size in that group, bringing it closer to that of the 
group that clearly used the safeguard. Clearly, this would reduce the sen- 
sitivity of a test comparing the mean effect sizes in the two groups and 
could obscure a genuine relationship between effect size and method- 
ological quality. The only parapsychological meta-analysis published so 
far that allowed assessors to code the presence of a safeguard in a study as 
unknown rather than merely present or absent found that up to 59% of 
studies fell into this category on certain safeguards (Steinkamp, Milton, 
& Morris, 1998), suggesting that the problem may be by no means trivial 
in other parapsychology databases. 

4. The binary quality ratings used in parapsychological meta-analyses 
may also lead to insensitive quality analyses because they are crude mea- 
sures of quality, whereas the seriousness of a flaw may often vary more 
smoothly than this in magnitude. For example, the use of card shuffling 
to randomize the target sequence in an ESP study would count as a 
flaw in most parapsychological meta-analyses; but, because randomness 
improves as the number of shuffles increases, a study in which the deck 
was shuffled a lot would be less prone to error than a study in which the 
deck was shuffled only a few times. Analyses based on the usual binary 
flaw ratings may be too insensitive to pick up a relationship between flaws 
and effect size (Stanford & Stein, 1994). 

5. Experimenters who obtain null results in their studies may give 
shorter accounts of them, leaving out details of the safeguards that they 
included (as Pratt, 1966, states that he did, for example). In a 
meta-analysis, such studies as a group would show a spurious association 
between low effect size and low quality; thus hiding, perhaps, a real associ- 
ation between low effect size and high quality in the other studies in the 
database (Milton, 1997). 

6. Quality coding has almost always been conducted non-blind in 
parapsychology meta-analyses; so, it is difficult to rule out the possibility 
of coders being influenced in their coding by the studies’ outcome. 
Coders who favor the psi hypothesis might be reluctant to ascribe flaws to 
successful studies or, conversely, might overcompensate for their bias by 
being more ready to penalize unsuccessful studies. Either strategy would 
introduce error variance. 

7. Flaws might not behave additively but might instead interact with 
each other, reducing the sensitivity of simple contrast or correlation anal- 
yses that examine the relationship between total flaws and effect size 
(Stanford & Stein, 1994). Similarly, the relationship between the lack of 
any given safeguard and effect size might not be linear; a flaw may only 
become “active” above a certain threshold, for example, and, again, a 
simple correlative approach would be insensitive to this (Stanford & 
Stein, 1994). 

8. If the presence of some flaws is negatively correlated, they might 
raise effect sizes in the database, but their effects would be difficult to de- 
tect. A database in which either safeguard A or safeguard B is present in 
each study, but never both together, serves as an extreme example to illus- 
trate the point. If the absence of each safeguard increases effect size to 
roughly the same degree, then a comparison of effect sizes of studies in 
which safeguard A is present with studies in which it is absent will show no 
difference; nor will such a comparison show any difference when applied 
to safeguard B (Hyman, 1985). 
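
Two of the problems listed above, the dilution described in point 3 and the masking described in point 8, lend themselves to a small numerical illustration. The Python sketch below uses entirely hypothetical numbers: the size of the inflation, the share of studies that used a safeguard without reporting it, and the function names are assumptions chosen only to make the arithmetic visible.

```python
from statistics import mean


def observed_difference(true_inflation, share_unreported_users):
    """Point 3: 'absent or unknown' group mean minus 'present' group mean.

    Studies that clearly used the safeguard have no inflation. The other
    group mixes unreported users (no inflation) with genuine non-users
    (inflated by true_inflation).
    """
    present_mean = 0.0
    mixed_mean = (1.0 - share_unreported_users) * true_inflation
    return mixed_mean - present_mean


def masked_flaw_comparison(inflation=0.2, n_each=10):
    """Point 8: every study lacks exactly one of safeguards A and B.

    Lacking either safeguard inflates effect size by the same amount.
    Returns (A-present mean minus A-absent mean, overall mean).
    """
    studies = (
        [{"A": True, "B": False, "es": inflation}] * n_each
        + [{"A": False, "B": True, "es": inflation}] * n_each
    )
    a_present = mean(s["es"] for s in studies if s["A"])
    a_absent = mean(s["es"] for s in studies if not s["A"])
    overall = mean(s["es"] for s in studies)
    return a_present - a_absent, overall


diluted = observed_difference(true_inflation=0.2, share_unreported_users=0.5)
difference_for_a, overall_mean = masked_flaw_comparison()
print(diluted, difference_for_a, overall_mean)
```

In the first case, a true inflation of .2 appears as a group difference of only .1 when half of the "absent or unknown" studies in fact used the safeguard. In the second, the comparison for safeguard A shows no difference at all, even though every study in the hypothetical database is inflated by .2.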

There are plenty of reasons, then, for being cautious about conclud- 
ing that methodological flaws do not increase study outcomes because es- 
timates of studies’ overall methodological quality are not statistically sig- 
nificantly correlated with effect size. Moreover, if the effects of flaws 
cannot be ruled out, then the other aspects of the meta-analyses’ results 
that appear to support the psi hypothesis — that is, implausibly large “file 
drawers” of unpublished, null studies required to render the overall 
cumulation nonsignificant, and replicability across experimenters — also 
are undermined. If study outcomes in a meta-analysis have been inflated 
by flaws, then so has the size of the “file drawer.” If it were possible to cor- 
rect study outcomes for the influence of those flaws, the overall 
cumulation would fall and the file drawer would, perhaps, no longer 
appear unreasonably large. Concerning replicability, all of the 
parapsychological databases that have examined it have shown statisti- 
cally significant heterogeneity of effect size across studies (Honorton & 
Ferrari, 1989; Honorton, Ferrari & Bem, 1998; Milton, 1997; Radin & 
Ferrari, 1991; Radin & Nelson, 1989; Stanford & Stein, 1994), with the ex- 
ception of the PRL ganzfeld database (Honorton et al., 1990). The 
replicability that many of them claim is replicability of successful rejec- 
tion of the null hypothesis, using a variety of methods. Honorton and 
Ferrari (1989), for example, report that 30% of studies and 37% of ex- 
perimenters obtained statistically significant results, indicating that 
more successful outcomes were obtained than the 5% expected by 
chance, and that success was not restricted to a few experimenters. 
Clearly, replicability defined in these terms is also vulnerable to explana- 
tion in terms of methodological artifacts in databases in which quality is 
unclear. 
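
The link between inflated study outcomes and an inflated file drawer can be made concrete with Rosenthal’s fail-safe N, a widely used file-drawer estimate. Whether each meta-analysis cited above used this particular estimate is not stated here, so the Python sketch below is an illustration under that assumption; it also assumes an unweighted Stouffer z for the Honorton (1985) row of Table 1, and the 50% deflation is purely hypothetical.

```python
from math import sqrt

# One-tailed critical z for alpha = .05, squared (about 2.706).
Z_CRIT_SQUARED = 1.645 ** 2


def fail_safe_n(sum_z, n_studies):
    """Rosenthal's fail-safe N: number of unreported studies averaging
    z = 0 needed to bring the cumulation down to p = .05, one-tailed."""
    return sum_z ** 2 / Z_CRIT_SQUARED - n_studies


# Sum of study z scores implied by Table 1 for Honorton (1985):
# Stouffer z = 6.60 over 28 studies, assuming no weighting.
sum_z = 6.60 * sqrt(28)

as_reported = fail_safe_n(sum_z, 28)

# Hypothetical case: flaws account for half of every study's z score.
if_half_is_artifact = fail_safe_n(0.5 * sum_z, 28)

print(round(as_reported), round(if_half_is_artifact))
```

Because the fail-safe N depends on the square of the summed z scores, halving the z scores in this hypothetical case cuts the estimated file drawer from over 400 studies to fewer than 100.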

There is, however, a second type of evidence for psi that is often 
mentioned in addition to the results of proof-oriented meta-analyses, 
and that is that a number of literature reviews and meta-analyses of 
process-oriented psi research appear to indicate consistent relation- 
ships between study outcomes and variables such as belief in psi, 
extraversion, and so on. It appears to be often assumed that such relation- 
ships would not be consistent if they were attributable to methodological 
flaws. For example, it may be assumed that the “sheep-goat” effect, in 
which believers in psi score higher on psi tasks than nonbelievers, cannot 
be due to sensory cues because these cues would be equally available to 
both sheep and goats, and both groups would be expected to show the 
same level of performance. 

This is not a safe assumption, however. There are many situations in 
which one might expect the action of flaws to produce consistent differ- 
ences between groups, in line with parapsychologists’ hypotheses. For ex- 
ample, in sheep-goat studies that do not have adequate sensory shielding, 
participants might be expected to be motivated to exploit those sensory 
cues (consciously or otherwise) to perform in accordance with their be- 
liefs, just as they are hypothesized to do with extrasensory cues — sheep to 
score more hits and goats to score fewer hits. The pattern of the results 
due to the inadequate sensory shielding would mimic that expected un- 
der the usual sheep-goat hypothesis. As Palmer (1978) notes, the results 
of ESP experiments tend to fall into patterns that make psychological 
sense, inasmuch as they appear similar to the patterns of results that one 
might expect if subjects were attempting to respond to very weak sensory 
information. Many spurious results due to flaws would also be expected 
to make similar sense, however, especially if in fact they were based on 
sensory leakage (see also Wiseman & Morris, 1995). Many of the more 
consistent findings of ESP research (such as higher scoring on 
confidence calls than on other trials [Carpenter, 1977; Palmer, 1978], higher 
scoring in studies with trial-by-trial feedback [Honorton & Ferrari, 
1989], and so on) make conventional psychological sense if one as- 
sumes that they are due to the exploitation by participants of weak- 
nesses in the design. Psychologically meaningful and consistent pat- 
terns of results would also be expected if safeguards preventing 
experimenter bias (such as predetermination of study sizes, 
prespecification of statistical tests, data checking and so on) were lack- 
ing. Arguing that process-oriented research has shown a consistent and 
meaningful pattern of results does not, therefore, allow side-stepping of 
the question of methodological quality if this argument is to be used in a 
proof-oriented way. Furthermore, it is difficult to make a strong 
proof-oriented case on the basis of this process-oriented work because 
meta-analyses of studies examining relationships between apparent ESP 
performance and moderator variables indicate similar problems of low 
or unclear quality in studies as are found in the proof-oriented 
meta-analyses (Honorton, Ferrari, & Bem, 1998; Lawrence, 1993;
Stanford & Stein, 1994).

I am not arguing that methodological problems clearly account for 
the positive results of the parapsychological meta-analyses. The study
quality estimates that the meta-analyses report are in most cases minimum
estimates of quality because they conservatively do not give the
benefit of the doubt to studies that do not report details of safeguards; 
the actual quality of the studies may have been higher than it appears. 
The general absence of demonstrable relationships between studies’ 
quality estimates and their effect sizes is encouraging for the psi hypoth- 
esis, if not a matter for complacency. Nor is it my intention to discour- 
age the use of meta-analysis as a valuable tool because it cannot answer 
all of the questions that we would want to ask about a database. It is 
clearly a more powerful method than traditional literature reviews for 
synthesizing research findings; however, there appear to be potentially 
serious problems with drawing strong conclusions from reviews and 
meta-analyses of studies that are not demonstrably strong in quality, and 
these problems apply as much to process-oriented research as they do to 
proof-oriented research. If providing strong evidence for psi is still seen 
as important, then it appears that the only way to do so is by demonstrat- 
ing a replicable, nonzero effect across a range of experimenters under 
stringent methodological conditions. So far, this does not appear to 
have happened. 

Implications for future research

Ganzfeld research seems an obvious area in which to continue to 
look for strong evidence for psi. No other research methodology in para- 
psychology has received the detailed critical attention that ganzfeld re- 
search has received; it is the only area in which a whole database of stud- 
ies has been examined intensively by a researcher such as Hyman who 
considers the existence of a genuine anomaly unlikely (Hyman, 1985) 
and in which researchers with opposing viewpoints have jointly produced 
methodological guidelines for research to settle the question of the exis- 
tence of psi (Hyman & Honorton, 1986). In addition, it has arguably 
come to represent the case for psi in microcosm for mainstream psychology
(Bem & Honorton, 1994; Milton & Wiseman, 1999a), and an account
of it appears in every major summary of parapsychology’s best evidence 
for psi (e.g., Atkinson, Atkinson, Smith, & Bem, 1990; Broughton, 1992;
Hayes, 1998; Krippner et al., 1993; Radin, 1997; Utts, 1991). The failure
of the recent studies to replicate the success of the earlier work therefore 
presents a challenge in the same mainstream scientific forum to parapsy- 
chology’s claims for a genuine, replicable effect. 

If ganzfeld research is to be an important player in the continued 
search for strong evidence, that search will only be successful if a 
replicable effect can be demonstrated. At present, however, if there is no 
change in the way ganzfeld research is carried out and no change in how 
replicability is examined, there appears to be no obvious reason why the 
next, inevitable meta-analysis of future ganzfeld studies will not show the 
same pattern of a null, or near-null, cumulation with perhaps a few indi- 
vidual experimenters obtaining effects that others are not replicating. In 
order to avoid repeating recent history, we need to know why the recent 
meta-analysis (Milton & Wiseman, 1999a) failed to replicate the findings 
of the PRL studies, which were carried out under similarly stringent 
conditions. 

Unfortunately, the explanation is far from clear. One possible reason 
could be that the results of earlier ganzfeld studies were due to method- 
ological problems rather than to psi; however, although a number of po- 
tential avenues for sensory leakage have been identified in the PRL work 
(Honorton et al., 1990; Morris, Cunningham, McAlpine, & Taylor, 1993; 
Wiseman, Smith, & Kornbrot, 1996), none appear sufficiently strong to
account in any obvious way for the success of those studies, which were
much better controlled than the earlier work (Milton & Wiseman,
1999a). 

Another possibility is that the PRL studies used psi-conducive procedures
but that the recent studies did not. This is possible but far from certain,
for two reasons. First, although Bem and Honorton (1994) identified
a number of variables that may be important for replication, the vast
majority of recent studies meta-analyzed (Milton & Wiseman, 1999a) either
did not measure or did not report the average values of these variables
in their studies (with the exception of the use of static versus dynamic
targets, where it is clear that the two databases are closely
matched); and so it is not possible to make a strong case that differences
in these variables accounted for the lack of replication (Milton & Wise- 
man, 1999a). 

Second, it is possible that any number of additional, unidentified variables
might have contributed to the success of the PRL studies; and so it is
not possible to know whether the recent studies’ failure to replicate the 
PRL work was due to their failure to exploit these variables to the same 
extent. There were a number of procedures used on all or almost all trials 
at PRL — the use of a sender, continuous auditory monitoring of the re- 
ceiver’s mentation by the sender, correspondence judging by the receiver 
rather than by an independent judge, (double-blind) prompting by the 
experimenter during the judging to correspondences that the receiver 
overlooked, a 14-minute pretrial relaxation procedure for both sender 
and receiver, and so on. The importance of these procedures has not 
been empirically determined. Any one or more of these procedures 
might be important for replication; however, without any evidence for 
their effects, it is not clear that the failure of the recent studies to repli- 
cate the findings of the PRL studies was due to the use of different proce- 
dures. It is not evident, at this point, what a replication of the PRL work in 
its essentials would have to consist of. 

Since the convention presentation of our meta-analysis (Milton &
Wiseman, 1997a), a number of colleagues have informally suggested that
if we had restricted our database to “standard” ganzfeld studies (i.e., studies
without unusual features), across-experimenter replication of the PRL
effect size might have been evident. However, among the researchers 
who have discussed the issue with me there appears to be little agreement 
about the features of a standard ganzfeld study. Devising a rule to define 
such a study at this point could easily appear as a post hoc attempt to ex- 
plain away a disappointing result, given that the previous ganzfeld 
meta-analyses included almost all studies and trials no matter how unusual
their procedures (Bem & Honorton, 1994; Honorton, 1985;
Hyman, 1985) and regardless of whether those procedures would be expected
to result in success or failure.² Neither Hyman and Honorton
(1986) nor Bem and Honorton (1994) specified that studies would have
to have certain features to be considered part of the replicability test that 
they proposed. It does not appear possible to selectively meta-analyze the 
recent studies and make a strong case that the ganzfeld effect is 
replicable; however, a selective meta-analysis with exclusion criteria 
stated in advance of studies being conducted would be a credible demon- 
stration of replicability if it obtained positive results. In practice, it is 
unlikely that criteria could be set up that would anticipate all of the novel
features that experimenters might introduce in their studies that would 
lead most researchers to expect them to be unsuccessful. In addition to 
having to conform to a basic set of criteria, the procedures planned for 
each study would therefore also have to be examined on a case-by-case ba- 
sis to determine whether or not the study ought to be included in the rep- 
lication test. The existence of such a project would neither affect the 
usual conduct of process-oriented research nor force experimenters to 
use certain procedures in their studies. It would simply be the case that 
studies eligible to be included in the meta-analysis would be included and 
others would not. Similarly, the project would not affect anyone’s usual 
freedom to conduct a meta-analysis of their own. In particular, there is no 
reason anyone should not conduct a process-oriented meta-analysis in- 
volving all studies. 
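
The logic of such a prospective test can be sketched in a few lines of Python. The sketch is illustrative only: the class `StudyPlan`, the function `eligible`, and the particular criteria are hypothetical. The criteria are borrowed from the PRL procedures listed earlier, whose importance has not been empirically determined, and they stand in for whatever rules researchers might actually agree upon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyPlan:
    """A study's planned procedures, recorded before any data exist."""
    label: str
    registered_before_conduct: bool
    uses_sender: bool
    receiver_judges: bool
    relaxation_minutes: int


def eligible(plan: StudyPlan) -> bool:
    """Decide inclusion from the plan alone; outcomes are never an input."""
    # Hypothetical criteria, chosen only to illustrate the mechanism.
    return (
        plan.registered_before_conduct
        and plan.uses_sender
        and plan.receiver_judges
        and plan.relaxation_minutes >= 14
    )


plans = [
    StudyPlan("Study A", True, True, True, 14),
    StudyPlan("Study B", True, False, True, 14),  # no sender planned
    StudyPlan("Study C", False, True, True, 14),  # plan recorded too late
]

included = [p.label for p in plans if eligible(p)]
print(included)
```

The point of the design is that `eligible` has no access to a study's result, so a decision to exclude cannot be influenced by whether the study succeeded.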

Some researchers may believe that it is already possible to identify 
successful ganzfeld studies based on their procedures alone, and that it 
would be advisable to begin such a meta-analysis now. Others may think 
this premature. Very few variables have been explored repeatedly or sys- 
tematically in ganzfeld studies, and even fewer have been examined 
meta-analytically across studies to determine whether there is good statis- 
tical evidence that they relate to effect size. Meta-analytic investigation of 
some of the variables suggested by Bem and Honorton (1994) as having
been important in the PRL work indicates that other experimenters have 
not replicated their effects in the few areas where this has been attempted 
(Milton & Wiseman, 1999a). In addition, some variables identified by 
Bem and Honorton as having had statistically significant relationships
with effect size in the PRL studies do not in fact appear to have done so
(Milton & Wiseman, 1999a), suggesting that our success so far in identifying
what variables are important in the ganzfeld might be more limited
than has been assumed. Before embarking upon a replication test that
should exploit its findings, it may be that a systematic assessment of
process-oriented ganzfeld research is called for (e.g., see Dalton, 1997b).

² The previous ganzfeld meta-analyses did not report explicit exclusion rules, but the
implicit rules appear to have been to include every ganzfeld study (for Hyman’s
meta-analysis) or every single trial (for the PRL meta-analysis) in which a ganzfeld
environment (even a modified one) was used to conduct an ESP test, with one disputed
exception. For the first meta-analysis of ganzfeld studies, Honorton provided Hyman with
“a copy of every ganzfeld study known to him” (Hyman, 1985, p. 4), all of which Hyman
included in his meta-analysis. The studies were procedurally very varied, with some having
features that laboratory lore might predict would not be psi-conducive, such as very short
mentation periods (e.g., Rogo et al., 1976); however, Honorton did exclude two conditions
in a study by Raburn (1975) in which participants were not aware that they were taking part
in an ESP test, on the grounds that these trials were too atypical of other ganzfeld research.
Hyman (1985) objected to their exclusion because other studies contained unique features
and yet were included in the database. Bem and Honorton’s (1994) subsequent meta-analysis
of the PRL work included every single trial done using the autoganzfeld. The PRL studies
were also procedurally varied and the meta-analysis included trials that, again, might
arguably not be expected to be successful, such as demonstration trials carried out in the
presence of a TV crew and trials from Series 302 in which Target 79 was included in the
target set on each trial despite its never having been previously correctly identified when
serving as the target.

Summary and conclusion 

The meta-analysis of recent, well-controlled ganzfeld studies (Milton 
& Wiseman, 1999a) indicates a failure to replicate the results of the earlier 
work, and the evidence for psi from meta-analyses and process-oriented re- 
views of parapsychology studies of low or uncertain quality does not ap- 
pear compelling. If the search for strong evidence for psi is to continue, 
ganzfeld research appears to be its natural arena. A meta-analysis that ex- 
cludes studies before they are conducted if they are not expected to repli- 
cate a positive effect appears to be the obvious test of future replication. 
Until more research has been done to identify what factors may be psi 
conducive in the ganzfeld, such a meta-analysis may be premature, but it 
appears to be an important goal to work towards. 

Many researchers may disagree with my assessment of the evidence 
for psi accumulated so far, and with my goal of continuing to seek stron- 
ger evidence in general, and with my proposal for a prospective 
ganzfeld meta-analysis in particular. Conversely, many may disagree 
with the use of meta-analyses of studies of uncertain quality being pro- 
moted as strong evidence for psi, and with ganzfeld research having be- 
come a crucial test case before the factors that affect its replicability have 
been well-established. Whatever researchers’ views may be, however, the 
momentum of previous events is carrying the field towards another inclu- 
sive meta-analysis of future ganzfeld studies that appears likely to show 
the same failure to replicate as did the last one. Should a second failure to 
replicate occur despite the warning of a first failure, it will give the ap- 
pearance of reasonably strong evidence against claims for psi as a 
replicable (and therefore, probably genuine) effect. 

If this is not a direction that parapsychologists want events to take, 
then now appears to be the time to say so. Although the choice of 
whether to carry out a meta-analysis is likely to be an individual one, its results
will affect other researchers. The opportunity for the research community,
rather than a few key individuals, to discuss the issues and express
their opinions is long overdue. I look forward to hearing the views
of my colleagues on the matters that I have discussed in this paper. 

Organization of an electronic mail discussion

The apparent replication problems in ganzfeld research described
in the preceding paper appeared to require discussion among the 
ganzfeld research community in order to determine what, if any, course 
of action seemed appropriate and could be agreed upon. I, therefore, 
invited a group of researchers with expertise in ganzfeld research and 
parapsychological meta-analysis to discuss the future of ganzfeld research
in a three-week electronic mail conference in May 1999. 

The invitees consisted of authors and coauthors of studies conducted 
since publication of the Hyman-Honorton guidelines; Honorton’s 
autoganzfeld research team; senior authors of at least two ganzfeld stud- 
ies conducted at any time; meta-analysts of ganzfeld research and of 
other proof-oriented meta-analyses; authors of published commentary 
on proof-oriented aspects of the ganzfeld meta-analyses; and, in order to 
include future ganzfeld experimenters, researchers planning to conduct 
a formal ganzfeld study within the next two years. 

Every effort was made to identify and locate eligible participants. Re- 
searchers planning to conduct ganzfeld studies within two years were 
sought via messages on the two main parapsychology electronic 
mailbases (PRF and PDL). For participants eligible through previous authorship,
contact details were sought from these mailbases, the
Parapsychological Association, alumni offices in UK universities (for 
those who had conducted research while students), former colleagues 
and co-authors, internet directory searches, and other invitees. Out of 65 
eligible participants, 58 were successfully traced. Forty-one invitees 
(71%) accepted the invitation to join the mailbase during its operation. 
They are listed in Appendix B. Each received an advance copy of the pre- 
ceding discussion paper and a preview copy of the Milton and Wiseman 
(1999a) ganzfeld meta-analysis paper, then in press with Psychological 
Bulletin. 

Because of the importance of the issues under discussion, John 
Palmer, the editor of this journal, agreed in advance to publish a tran- 
script of the debate. Participants were informed of this at the time of their 
invitation. They were also told that the transcript would be edited for 
length and re-ordered if necessary by an independent editor, Dr. Ger- 
trude Schmeidler. They were informed that any editing would be agreed 
with each message’s author before publication and that, to avoid bias, no 
substantive content would be removed. Participants were assured that in 
the interests of neutrality I would have no involvement in this editing and 
that John Palmer, himself a debate participant, would restrict himself to 
approving the edited material’s length and would have no influence on 
the nature of its content. 

The debate format had some unusual features intended to foster pro- 
ductive discussion. Participants were informed that there would be a 
strict policy of courtesy among discussants. In addition, so that argu- 
ments would be assessed on their merit rather than on their author’s sta- 
tus, each author’s identification and e-mail address were removed by a 
computer program en route to the mailbase and each message was only 
identified by a number. Authors could, however, reveal their identities in 
any particular message if this was necessary to make it clear that they 
spoke with authority on a question of fact (for example, in discussing unpublished
data from their own research). Otherwise, participants were
asked to help conceal their identities by wording messages in ways that 
would not reveal who they were. Participants were informed that the 
identity of each message’s author would be announced after the discus- 
sion had closed and would be published with the debate transcript. 

In order to ensure compliance with the rules of the discussion, a 
moderator, Professor Hoyt Edge, screened each message for anonymity
and courtesy, with a remit to negotiate if necessary an acceptable wording 
before posting the message on to the other participants. Participants 
were informed that I would have no involvement in the moderating pro- 
cess, again in the interests of neutrality. 

All members of the discussion group received an optional question- 
naire before and after the discussion asking their opinions on the main is- 
sues, and a post-discussion questionnaire concerning their satisfaction 
with the organizational features of the debate. The questionnaire data 
are presented in Appendix C. 

The edited debate material follows in Part II. Each message in the 
transcript is numbered, with its author listed in an appendix so that read- 
ers may, if they wish, have the same experience as the discussants of read- 
ing the material without knowing who wrote it, allowing themselves only 
to be swayed by the force of argument and evidence. 

Appendix A

Table A1

Ganzfeld Studies Published to Date (March 1999) since
Completion of Milton & Wiseman (1999a) Meta-analysis
(February 1997)

Study (N = 12)                                Number of trials      z       z/√N

Dalton (1997a)                                      128            5.26      .46
Parker & Westerlund (1998) Study IV                  30            2.40      .44
Parker & Westerlund (1998) Study V                   30            1.25      .23
Parker & Westerlund (1998) Serial Ganzfeld           30             ᵃ         ᵃ
Symmons & Morris (1997) Pilot Study                  12            ᵇ,ᶜ       ᵇ,ᶜ
Symmons & Morris (1997) Main Study                   51            2.98ᵇ     .42ᵇ
Wezelman & Bierman (1997)
  Amsterdam Series IV B                              32           -1.48     -.26
Wezelman & Bierman (1997)
  Amsterdam Series V                                 40            -.91     -.14
Wezelman & Bierman (1997)
  Amsterdam Series VI                                40            -.15     -.02
Wezelman & Bierman (1997)
  Amsterdam Series VI
  Exploratory Meditation Trials                       7           -1.11     -.42
Wezelman & Bierman (1997)
  Amsterdam Series VI
  Exploratory Psilocybine Trials                     12             ᵈ         ᵈ
Wezelman et al. (1997)                               32            2.15      .38


a In this study, the receiver’s task was to place the four targets in the judging set in the order in 
which they had been presented during the ganzfeld session. The authors present the results 
as a frequency table of the number of correct placements within each trial. By inspection the 
outcome is slightly below chance; however, the authors do not present or refer to any specific 
inferential statistical analysis and, because it is not clear what analysis was intended, no post 
hoc analysis has been imposed here. 

b In both studies by Symmons and Morris, tapes of drumming at different frequencies were 
used instead of white noise, and so it is questionable whether they can be considered as using 
a ganzfeld environment. The studies are included here to make clear the effects on the data- 
base of including or excluding them. 
c No outcome was reported for the pilot trials. 

d Two receivers guessed at the same target on 6 trials, obtaining 7 hits in the resulting 12 trials; 
however, data are not presented that would allow for correction of the nonindependence of 
their calls (the “stacking effect”: see Milton & Wiseman, 1999b), and so no outcome is
presented here.
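
The effect size in the last column of Table A1 is each study's z score divided by the square root of its number of trials. As a minimal sketch, the Python fragment below recomputes that column for the nine studies that report an outcome. The short study labels and the names `studies`, `effect_size`, and `effect_sizes` are illustrative and are not taken from the meta-analysis itself.

```python
import math

# (short label, number of trials, z) for the Table A1 studies with a reported z.
studies = [
    ("Dalton 1997a", 128, 5.26),
    ("Parker & Westerlund IV", 30, 2.40),
    ("Parker & Westerlund V", 30, 1.25),
    ("Symmons & Morris Main", 51, 2.98),
    ("Amsterdam IV B", 32, -1.48),
    ("Amsterdam V", 40, -0.91),
    ("Amsterdam VI", 40, -0.15),
    ("Amsterdam VI Meditation", 7, -1.11),
    ("Wezelman et al. 1997", 32, 2.15),
]


def effect_size(z, n_trials):
    """Effect size as used in Table A1: z divided by the square root of N."""
    return z / math.sqrt(n_trials)


effect_sizes = {label: effect_size(z, n) for label, n, z in studies}

for label, es in effect_sizes.items():
    print(f"{label:<26} {es:+.2f}")
```

Because the effect size is scaled by the number of trials, a small study with a modest z score can show an effect size as large as that of a much longer study.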

Table A2

Post Hoc Comparisons between Mean Effect Sizes in
Meta-analyses of Recent and Earlier Ganzfeld Studies

Databases compared                                      t     d.f.   p (one-tailed)

Honorton (1985) vs.:
  Milton & Wiseman (1999a)                             3.06    56       .0017
  All studies 1987 to present                          2.90    65       .0026
  All studies 1987 to present excl. Dalton (1997a)     3.04    64       .0017

Bem & Honorton (1994) vs.:
  Milton & Wiseman (1999a)                             2.64    39       .0059
  All studies 1987 to present                          2.22    48       .016
  All studies 1987 to present excl. Dalton (1997a)     2.38    47       .011
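
For readers who wish to check the one-tailed p values in Table A2 against the printed t and d.f. values, the following Python sketch computes the upper-tail probability of Student's t distribution using only the standard library. It is an illustration under stated assumptions, not the procedure used to produce the table: the function names `t_pdf` and `one_tailed_p` are hypothetical, and the tail area is obtained by simple numerical integration (Simpson's rule) rather than by a statistics package.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def one_tailed_p(t, df, steps=2000):
    """Upper-tail probability P(T > t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the distribution lies above zero; subtract the area between 0 and t.
    return 0.5 - total * h / 3


# (t, d.f.) pairs as printed in Table A2.
rows = [(3.06, 56), (2.90, 65), (3.04, 64), (2.64, 39), (2.22, 48), (2.38, 47)]
for t, df in rows:
    print(f"t = {t:.2f}, d.f. = {df}: p = {one_tailed_p(t, df):.4f}")
```

The printed probabilities can be compared directly with the last column of the table; small differences in the final digit would be expected from rounding of the published t values.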


Appendix B 

Members of the Discussion Group 

Members of the discussion group, in alphabetical order, were as follows
(those who posted messages are marked with an asterisk): Cheryl Alexander,
Daryl Bem, Dick Bierman, Douwe Bosga, William Braud,
Kathy Dalton, Deborah Delanoy, Norman Don, Ricardo Eppinger, Hans
Gerding,* Gerd Hovelmann, Anjum Khilji, Diana Kornbrot, Tony Lawrence,
Bruce McDonough, Stuart Menzies, Julie Milton, Bob Morris,
Roger Nelson,* John Palmer, Adrian Parker,* Dean Radin,* Chris Roe,
Ephraim Schechter,* Marilyn Schlitz, Fabio da Silva, Matthew Smith, Rex 
Stanford, Fiona Steinkamp, Charles Symmons, James Terry, Jessica Utts, 
Mario Varvoglis, Charles Warren, Caroline Watt,* Joakim Westerlund, 
Rens Wezelman,* Nils Wiklund,* Carl Williams, Melvyn Willin,* Richard 
Wiseman. 




Appendix C 

Questionnaire Data 

As noted earlier, all members of the mailbase group were sent an op- 
tional pre- and postdiscussion questionnaire concerning the main issues, 
and a postdiscussion questionnaire asking about their satisfaction with 
the organizational features of the debate. To minimize response bias, dis- 
cussants were asked to send their responses for compilation to the mod- 
erator, who would keep their individual replies permanently confidential 
from me. 

Pre- and Postdiscussion Opinions on the Main Issues 

The results of the questionnaires are summarized in Table C1. Just under
half of the mailbase members answered the pre- and postdiscussion
questionnaires, and so it is not clear that the results proportionately reflect
the views of the whole group. Respondents were not asked to give their
identities, to maximize response rates. It is, therefore, not clear whether 
any change in opinion reflects a change in the opinion of broadly the 
same group of people, or a change in the identities of those responding 
to the questionnaire. The data can only be interpreted as reflecting the 
views of those who chose to express an opinion at the time. 

Bearing these limitations in mind, it can be seen that respondents ap- 
peared to maintain their position of tending to favor (but with some un- 
certainty) the view that the experimental evidence for psi as a genuine 
anomaly is strong enough to convince a neutral scientist. Respondents 
tended to agree before the discussion that ganzfeld research should con- 
tinue as an important focus for psi as a genuine effect, replicable across 
experimenters under certain conditions; and they agreed more strongly 
with this view after the discussion. There was little change in respondents’ 
view that meta-analyses of stringently conducted studies are important as 
part of the case for psi as a replicable, genuine anomaly, nor in their view 
that it is necessary to plan exclusions in advance rather than post hoc in 
the next ganzfeld meta-analysis. Before the debate, respondents had a 
slight tendency to believe on balance that it is already possible to identify 
successful ganzfeld studies reasonably reliably in advance on the basis of 
their procedures; but afterwards the majority did not think this possible. 

Table C1

Opinions on the Main Discussion Issues Before and After Debate

                                                  Percent Agreement
Question and response                         Entry Poll     Exit Poll
                                              (N = 16)ᵃ      (N = 18)ᵇ

1. Do you think that the experimental evidence for psi is strong
enough that a neutral scientist should be convinced that a genuine
anomaly has been demonstrated, that is, that there is a phenomenon
not explicable in terms of error, selective reporting, fraud,
ordinary sensorimotor effects and so on?

    Yes, certainly                                13              6
    Yes, on balance                               50             44
    Uncertain                                     31             39
    No, on balance                                 0              0
    No, certainly not                              6             11

2. Do you think that ganzfeld research should remain an important
focus for testing the hypothesis that, at least under certain
conditions, psi is a genuinely anomalous effect that can be
replicated across experimenters?

    I do not believe that further testing of
    this hypothesis is necessary, it has
    already been sufficiently confirmed           13              6
    No, certainly not                             13              0
    No, on balance                                13             11
    Uncertain                                     19             11
    Yes, on balance                               19             61
    Yes, certainly                                25             11

3a. How important do you think meta-analyses of stringently
conducted parapsychology studies are in making at least part of the
case for psi as a genuine and replicable anomaly?

    Crucial                                       13             12
    Important                                     63             59
    Uncertain                                     13              6
    Not important                                  6              6
    Irrelevant                                     6             18

3b.ᶜ I am proposing that if a meta-analysis of ganzfeld studies
designed to test whether psi is a genuine and replicable anomaly is
to be selective and yet still credible, it is necessary to identify
studies for inclusion in advance of their conduct on the basis of
their planned procedures rather than excluding studies after they
have been conducted when their results are known. Do you agree?

    Yes, certainly                                29             40
    Yes, on balance                               36             33
    Uncertain                                      7             13
    No, on balance                                21             13
    No, certainly not                              7              0

4. Do you think that the procedures necessary for producing a
replicable ganzfeld effect have been identified to the extent that
it would be possible now to identify in advance which studies are
likely to be successful with reasonable reliability?

    No, certainly not                             13             28
    No, on balance                                25             28
    Uncertain                                     19             28
    Yes, on balance                               38             17
    Yes, certainly                                 6              0

a N = 14 for Question 3b.

b N = 17 for Question 3a and N = 15 for Question 3b.

c This question was only for respondents who answered “crucial” or “important” to
Question 3a.

Opinions on Discussion Features 

Seventeen members of the discussion group responded to the 
postdiscussion questionnaire asking for their views on various aspects of 
how the discussion was run. Concerning the time allowed for the discus- 
sion, most (65%) thought three weeks to be about right. The remainder 
(35%) would have preferred a longer period (between four and eight 
weeks, according to individual responses), with none thinking the debate
too long. Most respondents would have recommended message anonymity
and prearranged publication for future e-mail debates (70% and 69%,
respectively), with a moderator to screen for courtesy being very strongly
favored: 83% of respondents would have recommended this feature for a 
future debate. The present discussion’s moderator was not given the role 
of guiding the discussion, but 59% of respondents recommended such 
guidance for future e-mail debates. 



